Functional Water Networks in Fully Hydrated Photosystem II

Water channels and networks within photosystem II (PSII) of oxygenic photosynthesis are critical for enzyme structure and function. They control substrate delivery to the oxygen-evolving center and mediate proton transfer at both the oxidative and reductive endpoints. Current views on PSII hydration are derived from protein crystallography, but structural information may be compromised by sample dehydration and technical limitations. Here, we simulate the physiological hydration structure of a cyanobacterial PSII model following a thorough hydration procedure and large-scale unconstrained all-atom molecular dynamics enabled by massively parallel simulations. We show that crystallographic models of PSII are moderately to severely dehydrated and that this problem is particularly acute for models derived from X-ray free electron laser (XFEL) serial femtosecond crystallography. We present a fully hydrated representation of cyanobacterial PSII and map all water channels, both static and dynamic, associated with the electron donor and acceptor sides. Among them, we describe a series of transient channels and the attendant conformational gating role of protein components. On the acceptor side, we characterize a channel system that is absent from existing crystallographic models but is likely functionally important for the reduction of the terminal electron acceptor plastoquinone QB. The results of the present work build a foundation for properly (re)evaluating crystallographic models and for eliciting new insights into PSII structure and function.


System Preparation
The lipid-bilayer embedded model of Photosystem II is based on the high-resolution dimeric crystal structure (1.9 Å) of Thermosynechococcus vulcanus (PDB ID: 3WU2). 1 In the current setup we choose one of the monomers to build the entire system and first aligned it in a way such that the z-axis is normal to the membrane place. This is achieved using the OPM (Orientation of Protein in Membranes) web server. 2 All structurally resolved detergents used for the crystallization were removed. All the missing terminal peptide chains were completed and modelled using the Modeller 9.21 suite 3-5 and the respective best models were selected based on the scoring functions of obtained various models. The missing atoms of amino acids were completed using the pdb4amber module of AmberTools19, 6-8 and the protonation state of the residues and the co-factors were computed using the reduce module. 9 The missing or unassigned ligands or co-factors were inserted using the information from other cyanobacterial crystal structures, i.e. PDB ID 2AXT 10 and 4V62. 11 Some crystallographically resolved cofactors were found to have missing atoms, which were completed using the pymol suite 12 and atoms were re-numbered accordingly. All crystal waters associated with the monomer were retained during the set-up and the positions of additional waters was predicted using 3D-RISM (Three-Dimensional Reference Interaction Site Model) 13-18 calculations available in AmberTools19 (using rism3d.snglpnt). We employed the SPC water model 19 during the 3D-RISM calculations. Previous studies 15-16,20-23 have shown the capabilities of the 3D-RISM technique in predicting deeply buried water solvation site in bio-molecular systems. Results from the 3D-RISM calculations were further utilized to predict the explicit positions of the waters molecules using the placevent module. 20 A total of 5404 waters were added inside and around the protein. The complete protein-cavity water complex is placed inside a POPC (1-palmitoyl-2-oleoylsn-glycero-3-phosphocholine) bilayer of dimension 176 x 176 x 160 Å 3 using the Packmol-Memgen module 24-25 available with AmberTools19. A total of 397 and 393 lipids were added in the upper and lower leaflet, respectively. In addition, the box is filled up with water using Packmol-Memgen, up to a distance of 17.5 Å above and below the protein, resulting in a total of 111,066 water molecules in the simulation box. A distance tolerance of 2 Å is used when placing the membrane and water to avoid clashes during the minimization and equilibration process. We added 304 Na + and 269 Clatoms to neutralize the complete system and maintain a salt concentration (NaCl) of 0.15 M in order to mimic the physiological conditions. The entire system consists of 512,341 atoms.

Parameters for Protein, cofactors, lipid bilayer and water
The electrostatic charges for the cofactors and metal sites were computed based on the MK-RESP (Merz-Kollman Restrained Electrostatic Potential) methodology. 26-27 For the organic cofactors, first the hydrogens atoms were optimized at the B3LYP/def2-SVP level [28][29] and then single-point calculations were performed at the HF/6-31G* level of theory 26,30-31 using the ORCA program 32 and RESP fitting of the charges was performed using the Multiwfn code. 33 A bonded model is employed for the computation of the RESP charges on the OEC (Mn4CaO5 -Oxygen Evolving Complex) and NHI (non-heme iron) sites. As a first step, a small cluster model is built around the metal sites including the side chains of the residues which directly coordinate the metal site.

S-3
The OEC is modeled in its S1 state of the Kok-Joliot cycle, i.e. the oxidation states are Mn1(III)-Mn2(IV)-Mn3(IV)-Mn4(III) and involved ligands are Asp170, Glu354, Ala344, Asp342, Glu189, His332, Glu333, and four H2O molecules. Similarly, the NHI site is modelled as Fe(II) with the ligands HCO3 -, His214, His268, His215, and His272. For RESP fitting, the charges were only computed for Fe(II) and HCO3as a single unit, due to underestimation of Fe(II) charges when the complete cluster is considered. The remaining ligating histidine residues were given the corresponding charges from the AMBER force-field library. These models were first optimized at B3LYP/def2-TZVP and then RESP fitting is performed at B3LYP/6-31G* level. More importantly, we restrained the charge of the backbone atoms of the residues according to the original AMBER force field 31 as such a procedure is known to produce better back-bone dynamics during the simulation. 34 The RESP charges of the chlorophylls and the heme iron site were calculated in a similar fashion. The chlorophylls and the heme-iron are ligated axially to amino acids and water molecules, wherever applicable. For example, PD1 and PD2 of the reaction center are axially ligated to histidine residues and ChlD1 and ChlD2 are axially ligated to a single water molecule. Similarly, both heme sites are bound axially with two histidine residues.
The standard protein residues were described using the Amber14SB force-field 35 and water is modelled using the TIP3P model. 36 We used GAFF2 (General Amber Force Field) 37 for the organic cofactors, where the appropriate atom types were automatically generated with the ANTECHAMBER module of the AmberTools19. We employed the LIPID17 force field 38-39 for the POPC bilayer. Earlier experimental investigations 40 found phosphatidylcholine group based bilayer active in the oxygen evolution. Several previous molecular dynamics simulation have also successfully followed the same approach of using the POPC lipid bilayer. [41][42] The parameters for the chlorophyll a and heme iron site were obtained directly from the literature. [43][44] Customized bonded parameters were used for the modeling of the OEC and NHI site, for example the bonds within the OEC were assigned a value of 150 kcal mol -1 Å -2 , whereas the bonds between the OEC and coordinating residues were assigned a value of 70 kcal mol -1 Å -2 . Similarly, the angles within the OEC were restrained with a value of 120 kcal mol -1 rad -2 and angles between the OEC and coordinating residues with a value of 70 kcal mol -1 rad -2 . In addition, parameters previously reported by Ishikita and coworkers 42 for the NHI site were imported in our model. The non-bonded parameters for the metals were based on their oxidation state using the Ion-Oxygen Distance (IOD) dataset available for the TIP3P model. 34,[45][46] For Na + and Cl -, we used the Joung-Cheatham parameters [47][48] compatible with the TIP3P model.

Minimization
The system was minimized systematically and thoroughly in order to remove energetically unfavourable clashes inside the system. All the minimization procedure is performed on the CPU version of the pmemd engine. S-4 Step 1. All hydrogen atoms were optimized for a total of 1500 steps, involving 50% each of steepest descent and conjugate gradient with a restraint weight of 50 kcal mol -1 Å -2 on all the heavy atoms (i.e. non-hydrogen atoms).
Step 2. The waters, ions and membrane were optimized for 20000 steps involving 50% each of steepest descent and conjugate gradient with a force constant of 50 kcal mol -1 Å -2 on the protein and the cofactors.
Step 3. The complete system is minimized (15000 steps) keeping the Cα atom of amino acids restrained with a force constant of 20 kcal mol -1 Å -2 . Positional restraints used the final minimization step were maintained throughout the equilibration process.

Equilibration and Production Runs
Step 1. The system was slowly heated by increasing from 10 K to 100 K within 5 ps in the NVT ensemble, using the final configuration obtained from the minimization procedure, while maintaining the positional restraints (20 kcal mol -1 Å -2 ) on the Cα atom of amino acids. We employed a slightly larger value of the collision frequency (5 ps -1 ) in the Langevin thermostat. Particle Mesh Ewald (PME) 49 approach is used to treat all electrostatic interactions with a 10 Å cut-off.
Step 2. The temperature of the system is further increased from 100 K to the target temperature (303 K) within 125 ps in the NPT ensemble with a collision frequency of 5 ps -1 of the Langevin thermostat. 50 Step 3. In the next step, we decreased the restraints on the Cα atom in a step-wise manner, i.e. decreasing from 20 to 2 kcal mol -1 Å -2 with an interval of 2 kcal mol -1 Å -2 . We ran 400 ps of NPT simulations for each interval, which totals to 4 ns for the complete procedure. Thereafter, all the restraints were completely removed and the system was simulated for another 2 ns.
Step 4. In the next step, we invoked the MC/MD module 51 of Amber18 with water as a solvent to be exchanged between the bulk and protein interior. The main objectives for using this elegant and expensive technique are: (a) to ensure that the interiors of the protein are properly hydrated during the dynamic evolution of the protein propagated by the standard MD simulation step, and (b) to dehydrate internal cavities in case 3D-RISM over-hydrated the structure. Steric grid size was carefully chosen in order to ensure that entire internal cavities of the protein are covered within. The number of Monte-Carlo (MC) move attempts in each MC cycle was set at 1,000,000, whereas the number of MD steps in each MC cycle was set at 1,000. During all of these computations, we restrained the Cα atoms of the protein to 5 kcal mol -1 Å -2 . After a certain amount of MC cycles, we observed no further "interesting moves" that hydrated or de-hydrated the internal cavities, and thereafter we called off the MC/MD computations. The "hydration-equilibrated" PSII structure obtained after this procedure is an ideal starting point for further simulations.
Step 5. We further equilibrated the entire system for another 61 ns to ensure the POPC bilayer is properly equilibrated. The electron density profile of the POPC lipid bilayer is reported in Figure S1.

S-5
Step 6. We initiated unbiased production run for a total of 200 ns in the NPT ensemble. Frames were saved every 2 ps throughout the production simulations. The stability of the protein is depicted in Figure S2.
During the system heating procedure (Step 1 and Step 2), the temperature is controlled using Langevin Dynamics with a collision frequency of 5 ps -1 , after stabilising the temperature we switched to 1 ps -1 of collision frequency. In all cases, the pressure is regulated anisotropically using the Berendsen Barostat 52 with a pressure relaxation time of 2 ps and maintained at 1 bar. Bonds involving the hydrogen atoms were constrained using the SHAKE algorithm, 53 which allowed us to use an integration time step of 2 fs. Particle Mesh Ewald (PME) approach is used to treat all electrostatic interactions with a 10 Å cut-off. The equilibration and production runs were performed on the CUDA (Compute Unified Device Architecture) version of pmemd engine. [54][55][56] Equilibration and production runs were performed using the Tesla V100 and Quadro RTX 5000 graphics cards.
Analysis of the trajectories of the production simulation were performed using both CPU and the GPU version of the CPPTRAJ. [57][58] For visualization, analysis and image rendering we employed PyMOL 12 and VMD. 59 Figure S1. Electron density profile of the POPC lipid bilayer during the equilibration and production simulations. This profile was generated using the density option in the CPPTRAJ module. Figure S2. Structural evolution of the Cα atoms of the protein during the production simulations. Two data sets were created, i.e. complete protein evolution (red trace) and without the intrinsically disordered part (black trace).

Production Phase Equilibration Phase
Electron Density Profile S-6 Figure S3. Evolution of the water content around the protein (left) and the oxygen evolving complex (right) during the long equilibration phase (~65 ns). The water count is performed using the watershell command of the CUDA enabled CPPTRAJ module of AmberTools19. We used a distance criterion of 2 Å in counting the number of waters around the protein. The number of waters were counted every ~105 ps (due to extremely high computational cost, even on the GPU) in case of the evolution of the water around the complete protein. Figure S4. Distribution of the oxygen atom of the water around the QB pocket obtained using the 3D-RISM calculations. The circled spatial region represents the density of the QB pocket waters. All the hydrogen atoms are omitted for the visual clarity.